-
Notifications
You must be signed in to change notification settings - Fork 770
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[16192] fix notification lost #3087
[16192] fix notification lost #3087
Conversation
Signed-off-by: Chen Lihui <lihui.chen@sony.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks good to me.
@iuhilnehc-ynos Thank you for your contribution. Changes LGTM, but we'll need to add a regression test before merging this. |
Signed-off-by: Chen Lihui <lihui.chen@sony.com>
I have added a regression test. The test log shows that I need 176070(a random number) times to run the test to output an error in my local machine if not applying the patch.
Do you have any suggestions about the regression test? |
Signed-off-by: Chen Lihui <lihui.chen@sony.com>
Signed-off-by: Chen Lihui <lihui.chen@sony.com>
not really, sorry 😢 |
Signed-off-by: Chen Lihui <lihui.chen@sony.com>
Signed-off-by: Chen Lihui <lihui.chen@sony.com>
@richiprosima Please test this |
@iuhilnehc-ynos It seems this PR is introducing possible deadlocks. At least that is what the TSan job is complaining about. You can get the report from here. |
Signed-off-by: Miguel Company <MiguelCompany@eprosima.com>
@iuhilnehc-ynos I think I fixed the deadlock on 8c60364. Let's wait for the TSan job. |
@richiprosima Please test this |
I just analyzed the log, but you have already fixed the potential deadlock. Thank you. |
@MiguelBarro This has a deadlock report on code related with ReadConditions. Would you mind taking a look? |
The fix consists on assuring no notification will take place on predicate evaluation by taken the mutex during notifications (
A workaround may be to introduce an |
Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>
Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>
Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>
Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>
Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>
Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>
Signed-off-by: Miguel Barro <miguelbarro@eprosima.com>
@MiguelCompany @MiguelBarro anything we need to do on our side? friendly ping. |
friendly ping, this can resolve a couple of issues. |
@richiprosima Please test this |
@richiprosima Please test windows |
@Mergifyio backport 2.8.x 2.7.x 2.6.x |
✅ Backports have been created
|
I am sorry about the comment ros2/rmw_fastrtps#650 (comment). If using 34a8f8d, the I was thinking you will use #3098, which can re-check the condition every 500ms. |
@iuhilnehc-ynos Thank you for checking with 34a8f8d. We will then update this PR to aa260da and close #3098. Would you mind checking ros2/rmw_fastrtps#650 with aa260da to verify it solves the issue? |
@richiprosima Please test this |
@richiprosima Please test Linux |
I believe this PR with aa260da can fix the
|
* fix notification lost Signed-off-by: Chen Lihui <lihui.chen@sony.com> * add a regression test Signed-off-by: Chen Lihui <lihui.chen@sony.com> * rename a variable name and update comments Signed-off-by: Chen Lihui <lihui.chen@sony.com> * fix uncrustify issue Signed-off-by: Chen Lihui <lihui.chen@sony.com> * make the regression test better Signed-off-by: Chen Lihui <lihui.chen@sony.com> * fix uncrustify Signed-off-by: Chen Lihui <lihui.chen@sony.com> * Refs #16192. Fix deadlock on WaitSetImpl. Signed-off-by: Miguel Company <MiguelCompany@eprosima.com> * Refs 16192. Workaround for deadlock and notify pessimization Signed-off-by: Miguel Barro <miguelbarro@eprosima.com> * Refs 16192. Mandatory logic fix Signed-off-by: Miguel Barro <miguelbarro@eprosima.com> * Refs 16192. linter Signed-off-by: Miguel Barro <miguelbarro@eprosima.com> * Refs #16192. Catch up any notification lost Signed-off-by: Miguel Barro <miguelbarro@eprosima.com> * Refs 16192. Avoid predicate evaluation on spurious wakeups. Signed-off-by: Miguel Barro <miguelbarro@eprosima.com> * Refs 16192. Fixing gcc warnings Signed-off-by: Miguel Barro <miguelbarro@eprosima.com> * Refs 16192. Allowing wait without previous notification Signed-off-by: Miguel Barro <miguelbarro@eprosima.com> Signed-off-by: Chen Lihui <lihui.chen@sony.com> Signed-off-by: Miguel Company <MiguelCompany@eprosima.com> Signed-off-by: Miguel Barro <miguelbarro@eprosima.com> Co-authored-by: Miguel Company <miguelcompany@eprosima.com> Co-authored-by: Miguel Barro <miguelbarro@eprosima.com> (cherry picked from commit a9e49cd)
* fix notification lost Signed-off-by: Chen Lihui <lihui.chen@sony.com> * add a regression test Signed-off-by: Chen Lihui <lihui.chen@sony.com> * rename a variable name and update comments Signed-off-by: Chen Lihui <lihui.chen@sony.com> * fix uncrustify issue Signed-off-by: Chen Lihui <lihui.chen@sony.com> * make the regression test better Signed-off-by: Chen Lihui <lihui.chen@sony.com> * fix uncrustify Signed-off-by: Chen Lihui <lihui.chen@sony.com> * Refs #16192. Fix deadlock on WaitSetImpl. Signed-off-by: Miguel Company <MiguelCompany@eprosima.com> * Refs 16192. Workaround for deadlock and notify pessimization Signed-off-by: Miguel Barro <miguelbarro@eprosima.com> * Refs 16192. Mandatory logic fix Signed-off-by: Miguel Barro <miguelbarro@eprosima.com> * Refs 16192. linter Signed-off-by: Miguel Barro <miguelbarro@eprosima.com> * Refs #16192. Catch up any notification lost Signed-off-by: Miguel Barro <miguelbarro@eprosima.com> * Refs 16192. Avoid predicate evaluation on spurious wakeups. Signed-off-by: Miguel Barro <miguelbarro@eprosima.com> * Refs 16192. Fixing gcc warnings Signed-off-by: Miguel Barro <miguelbarro@eprosima.com> * Refs 16192. Allowing wait without previous notification Signed-off-by: Miguel Barro <miguelbarro@eprosima.com> Signed-off-by: Chen Lihui <lihui.chen@sony.com> Signed-off-by: Miguel Company <MiguelCompany@eprosima.com> Signed-off-by: Miguel Barro <miguelbarro@eprosima.com> Co-authored-by: Miguel Company <miguelcompany@eprosima.com> Co-authored-by: Miguel Barro <miguelbarro@eprosima.com> (cherry picked from commit a9e49cd)
* fix notification lost Signed-off-by: Chen Lihui <lihui.chen@sony.com> * add a regression test Signed-off-by: Chen Lihui <lihui.chen@sony.com> * rename a variable name and update comments Signed-off-by: Chen Lihui <lihui.chen@sony.com> * fix uncrustify issue Signed-off-by: Chen Lihui <lihui.chen@sony.com> * make the regression test better Signed-off-by: Chen Lihui <lihui.chen@sony.com> * fix uncrustify Signed-off-by: Chen Lihui <lihui.chen@sony.com> * Refs #16192. Fix deadlock on WaitSetImpl. Signed-off-by: Miguel Company <MiguelCompany@eprosima.com> * Refs 16192. Workaround for deadlock and notify pessimization Signed-off-by: Miguel Barro <miguelbarro@eprosima.com> * Refs 16192. Mandatory logic fix Signed-off-by: Miguel Barro <miguelbarro@eprosima.com> * Refs 16192. linter Signed-off-by: Miguel Barro <miguelbarro@eprosima.com> * Refs #16192. Catch up any notification lost Signed-off-by: Miguel Barro <miguelbarro@eprosima.com> * Refs 16192. Avoid predicate evaluation on spurious wakeups. Signed-off-by: Miguel Barro <miguelbarro@eprosima.com> * Refs 16192. Fixing gcc warnings Signed-off-by: Miguel Barro <miguelbarro@eprosima.com> * Refs 16192. Allowing wait without previous notification Signed-off-by: Miguel Barro <miguelbarro@eprosima.com> Signed-off-by: Chen Lihui <lihui.chen@sony.com> Signed-off-by: Miguel Company <MiguelCompany@eprosima.com> Signed-off-by: Miguel Barro <miguelbarro@eprosima.com> Co-authored-by: Miguel Company <miguelcompany@eprosima.com> Co-authored-by: Miguel Barro <miguelbarro@eprosima.com> (cherry picked from commit a9e49cd)
This reverts commit a9e49cd.
@iuhilnehc-ynos The browser showed me your commit after merging. We're reverting the merge on #3193. I thought about this a bit more during the weekend, and decided a new approach, based on your original solution. After the revert is done, I will create a new PR with the changes on this branch. Could you check aginst that branch? |
It seems there is some compilation issue on that branch. Maybe you are working on it, I'll check it based on your new PR. |
@iuhilnehc-ynos Yeah, sorry about that. It should be fixed now. I have also created #3195 with the backport to 2.6.x, so we can have the fix on Humble too. |
Signed-off-by: Chen Lihui lihui.chen@sony.com
Description
Please refer to ros2/rclcpp#2039 (comment).
Contributor Checklist
versions.md
file (if applicable).Reviewer Checklist